
    Formal derivation of concurrent assignments from scheduled single assignments

    Concurrent assignments are commonly used to describe synchronous parallel computations. We show how a sequence of concurrent assignments can be formally derived from the schedule of an acyclic single-assignment task graph and a memory allocation. To do this, we develop a formal model of memory allocation in synchronous systems. We use weakest-precondition semantics to show that the sequence of concurrent assignments computes the same values as the scheduled single assignments. We give a lower bound on the memory requirements of memory allocations for a given schedule. This bound is tight: we define a class of memory allocations whose memory requirements always meet the bound. This class corresponds to conventional register allocation for DAGs and is suitable when memory access times are uniform. We furthermore define a class of simple "shift register" memory allocations. These allocations require a minimum of explicit storage control, and they yield local or nearest-neighbour accesses in distributed systems whenever the schedule allows this. This class of allocations is therefore suitable when designing parallel special-purpose hardware, such as systolic arrays.
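
    To make the derivation concrete, here is a minimal sketch (the task graph, schedule and register names are invented for illustration and are not the paper's notation) of how a scheduled single-assignment DAG together with a memory allocation yields one concurrent assignment per time step, with a register reused as soon as the value it held is dead:

```python
# Hypothetical sketch: a tiny scheduled single-assignment task graph and a
# memory (register) allocation, turned into one concurrent assignment per step.
tasks = {                                    # value -> (operator, operands)
    "a": ("+", ["x", "y"]),
    "b": ("*", ["x", "x"]),
    "c": ("+", ["a", "b"]),
}
schedule = {"a": 0, "b": 0, "c": 1}          # time step of each task
alloc = {"x": "r0", "y": "r1", "a": "r2", "b": "r3", "c": "r2"}  # value -> register

def concurrent_assignments(tasks, schedule, alloc):
    """Group the scheduled single assignments step by step; the assignments
    of one step form a single concurrent assignment ('||' marks parallelism)."""
    program = []
    for t in sorted(set(schedule.values())):
        stmts = [
            f"{alloc[v]} := {op}({', '.join(alloc[a] for a in args)})"
            for v, (op, args) in tasks.items()
            if schedule[v] == t
        ]
        program.append(" || ".join(stmts))
    return program

for line in concurrent_assignments(tasks, schedule, alloc):
    print(line)
# r2 := +(r0, r1) || r3 := *(r0, r0)
# r2 := +(r2, r3)   <- r2 is reused for 'c': a concurrent assignment reads its
#                      right-hand sides before storing, so the old value of 'a'
#                      is still available within the same step
```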

    Computing transitive closure on systolic arrays of fixed size

    Forming the transitive closure of a binary relation (or directed graph) is an important part of many algorithms. When the relation is represented by a bit matrix, the transitive closure can be computed efficiently in parallel on a systolic array. Various such arrays for computing the transitive closure have been proposed, but they all have in common that the size of the array must be proportional to the number of nodes. Here we propose two ways of computing the transitive closure of an arbitrarily big graph on a systolic array of fixed size. The first method is a simple partitioning of a well-known systolic algorithm for computing the transitive closure. The second is a block-structured algorithm, suitable for execution on a systolic array that can multiply fixed-size bit matrices and compute the transitive closure of graphs with a fixed number of nodes. The algorithm is, however, not limited to systolic array implementations; it works on any parallel architecture that can efficiently form the transitive closure and product of fixed-size bit matrices. The shortest path problem for directed graphs with weighted edges can also be solved for arbitrarily large graphs on a fixed-size systolic array in the same manner as the transitive closure is computed.
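
    As a rough illustration of the block-structured idea (this is a textbook blocked Warshall formulation, not necessarily the paper's algorithm), the closure of an arbitrarily large bit matrix can be computed using only two fixed-size primitives, boolean multiplication and closure of B x B blocks, which is exactly what a fixed-size systolic array could provide:

```python
# Assumed block formulation (standard blocked Warshall) built on two fixed-size
# primitives: boolean product and transitive closure of B x B bit matrices.
B = 2  # block size, fixed by the (hypothetical) hardware

def bool_mul(X, Y):
    """Boolean product of two B x B 0/1 matrices."""
    return [[int(any(X[i][k] and Y[k][j] for k in range(B)))
             for j in range(B)] for i in range(B)]

def bool_or(X, Y):
    return [[X[i][j] | Y[i][j] for j in range(B)] for i in range(B)]

def block_closure(X):
    """Transitive closure of one B x B block (Warshall's algorithm)."""
    C = [row[:] for row in X]
    for k in range(B):
        for i in range(B):
            for j in range(B):
                C[i][j] |= C[i][k] and C[k][j]
    return C

def transitive_closure(A):
    """Blocked Warshall on an nb x nb grid of B x B blocks (modified in place)."""
    nb = len(A)
    for k in range(nb):
        A[k][k] = block_closure(A[k][k])
        for j in range(nb):
            if j != k:
                A[k][j] = bool_or(A[k][j], bool_mul(A[k][k], A[k][j]))
        for i in range(nb):
            if i != k:
                A[i][k] = bool_or(A[i][k], bool_mul(A[i][k], A[k][k]))
        for i in range(nb):
            for j in range(nb):
                if i != k and j != k:
                    A[i][j] = bool_or(A[i][j], bool_mul(A[i][k], A[k][j]))
    return A

# Example: the 4-node chain 0 -> 1 -> 2 -> 3, stored as a 2 x 2 grid of blocks.
A = [[[[0, 1], [0, 0]], [[0, 0], [1, 0]]],
     [[[0, 0], [0, 0]], [[0, 1], [0, 0]]]]
print(transitive_closure(A)[0][1])   # [[1, 1], [1, 1]]: nodes 0 and 1 reach 2 and 3
```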

    Data Cache Locking for Higher Program Predictability

    Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving differently than expected. Cache locking mechanisms adapt caches to the needs of real-time systems. Locking the cache trades performance for predictability: at the cost of generally lower performance, memory access times become predictable. This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way. To obtain predictable cache behavior, we first lock the cache for those parts of the code where the static analysis fails. To minimize the performance degradation, our method loads the cache, if necessary, with data likely to be accessed. Experimental results show that this scheme is fully predictable without compromising the performance of the transformed program. Compared to an algorithm that assumes compulsory misses when the state of the cache is unknown, our approach eliminates all overestimation for the set of benchmarks, giving an exact WCMP of the transformed program without any significant decrease in performance.
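
    The effect of locking on the worst-case bound can be sketched as follows (the access classification and penalty figures are assumptions, not the paper's experimental setup): accesses the static analysis cannot classify must be counted as misses, whereas a locked and preloaded cache turns them into guaranteed hits:

```python
# Illustrative sketch only: how locking makes statically unanalyzable accesses
# predictable when bounding the worst-case memory performance (WCMP).
MISS_PENALTY = 100   # cycles, assumed
HIT_TIME = 1         # cycles, assumed

# Per-access classification from a (hypothetical) static cache analysis:
# 'hit', 'miss', or 'unknown' (the analysis could not decide).
accesses = ["hit", "unknown", "miss", "unknown", "hit", "unknown"]

def wcmp_pessimistic(accesses):
    """Baseline: every unclassified access is counted as a miss."""
    return sum(MISS_PENALTY if a in ("miss", "unknown") else HIT_TIME
               for a in accesses)

def wcmp_locked(accesses, locked_hits):
    """With the cache locked and preloaded for the unanalyzable region, the
    accesses whose indices are in locked_hits become guaranteed hits."""
    cost = 0
    for i, a in enumerate(accesses):
        if a == "unknown":
            cost += HIT_TIME if i in locked_hits else MISS_PENALTY
        else:
            cost += MISS_PENALTY if a == "miss" else HIT_TIME
    return cost

print(wcmp_pessimistic(accesses))            # 402: loose bound
print(wcmp_locked(accesses, {1, 3, 5}))      # 105: tight bound, unknowns are now hits
```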

    Timing Analysis of Parallel Software Using Abstract Execution

    A major trend in computer architecture is multi-core processors. To fully exploit this type of parallel processor chip, the programs running on it will have to be parallel as well. This means that even hard real-time embedded systems will be parallel. It is therefore of utmost importance that methods to analyze the timing properties of parallel real-time systems are developed. This paper presents an algorithm, founded on abstract interpretation, that derives safe approximations of the execution times of parallel programs. The algorithm is formulated and proven correct for a simple parallel language with parallel threads, shared memory and synchronization via locks.
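
    A minimal sketch of the flavour of such an analysis (a hand-rolled interval model, not the paper's algorithm): execution times are kept as lower/upper bounds, sequential composition adds them, joining parallel threads takes the worst case, and a lock-protected section is padded with the time the other thread may hold the lock:

```python
# Toy timing bounds for a two-thread program; all numbers are invented.

def seq(a, b):
    """Bounds of running a then b on one thread."""
    return (a[0] + b[0], a[1] + b[1])

def join(a, b):
    """Bounds of waiting for two parallel threads to both finish."""
    return (max(a[0], b[0]), max(a[1], b[1]))

def with_lock(section, other_critical):
    """Entering a critical section may first have to wait for the other
    thread's critical section (a simple over-approximation, assuming each
    thread takes the lock at most once)."""
    return (section[0], section[1] + other_critical[1])

# Two threads, each doing some local work followed by a critical section.
work1, crit1 = (10, 20), (3, 5)
work2, crit2 = (8, 25), (4, 6)

t1 = seq(work1, with_lock(crit1, crit2))
t2 = seq(work2, with_lock(crit2, crit1))
print(join(t1, t2))   # (13, 36): bounds on the whole parallel program
```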

    Technical Report: Feedback-Based Generation of Hardware Characteristics

    In large, complex, server-like computer systems it is difficult to characterise hardware usage in the early stages of system development. Often the applications that will run on the platform are not ready at the time of platform deployment, which postpones metrics measurement. In our study we seek answers to two questions: (1) Can we use a feedback-based control system to create a characteristics model of a real production system? (2) Can such a model be sufficiently accurate to detect characteristics changes instead of executing the production application? The model we have created runs a signalling application, similar to the production application, together with a PID regulator that generates L1 and L2 cache misses to the same extent as the production system. Our measurements indicate that we have managed to mimic a similar environment with respect to cache characteristics. Additionally, we have applied the model to a software update for a production system and detected characteristics changes using the model. This was later verified on the complete production system, which in this study is a large-scale telecommunication system with a substantial market share.
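
    The feedback idea can be sketched as follows (the controller gains and the miss-rate model are made up for illustration; the real model regulates measured L1/L2 misses on actual hardware): a PID regulator adjusts a synthetic memory-touching load until its observed miss rate tracks the rate measured on the production system:

```python
# Hedged illustration: a PID regulator driving a synthetic load's "working-set
# knob" so that a toy cache-miss model reaches a target miss rate.

class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, target, measured):
        err = target - measured
        self.integral += err
        deriv = err - self.prev_err
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Toy "plant": the miss rate grows with the working-set knob, saturating at 1.
def observed_miss_rate(knob):
    return max(0.0, min(1.0, 0.01 * knob))

target_miss_rate = 0.35        # rate measured on the production system (assumed)
knob = 0.0                     # how much memory the synthetic load walks over
pid = PID(kp=20.0, ki=2.0, kd=5.0)

for _ in range(200):
    measured = observed_miss_rate(knob)
    knob += pid.step(target_miss_rate, measured)

print(round(observed_miss_rate(knob), 3))  # ~0.35: the load now matches the target
```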

    Code Analysis for Temporal Predictability

    The WCET Tool Challenge 2011

    Get PDF
    Following the successful WCET Tool Challenges in 2006 and 2008, the third event in this series was organized in 2011, again with support from the ARTIST DESIGN Network of Excellence. Following the practice established in the previous Challenges, the WCET Tool Challenge 2011 (WCC'11) defined two kinds of problems to be solved by the Challenge participants with their tools: WCET problems, which ask for bounds on the execution time, and flow-analysis problems, which ask for bounds on the number of times certain parts of the code can be executed. The benchmarks used in WCC'11 were debie1, PapaBench, and an industrial-strength application from the automotive domain provided by Daimler AG. Two default execution platforms were suggested to the participants, the ARM7 as "simple target" and the MPC5553/5554 as "complex target," but participants were free to use other platforms as well. Ten tools participated in WCC'11: aiT, Astrée, Bound-T, FORTAS, METAMOC, OTAWA, SWEET, TimeWeaver, TuBound and WCA.

    Unfolding of Programs with Nondeterminism

    In Lisper, "Total Unfolding: Theory and Applications", some results were proved regarding properties of unfolding of purely functional programs. In particular, a theorem was shown that relates the termination of symbolic evaluation of a "less instantiated" term to that of a "more instantiated" term. An application is partial evaluation, where unfolding of function definitions is frequently performed to enable further simplifications of the resulting specialized program. The unfolding must then be kept under control to ensure that the partial evaluation terminates. In this paper, we extend the termination result from purely functional programs to programs with nondeterministic operations. We give an operational semantics where the behaviour of operators is defined through rewrite rules; nondeterminism then occurs when the resulting term rewriting system is nonconfluent. For the confluent part, the previous termination results carry over. It is, however, not guaranteed in general that the resulting unfolded program has the same semantics as the original program. We give conditions on the rewrite rules that guarantee that both versions have the same semantics, and we show that they apply to a nontrivial class of nondeterministic languages.
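
    A classic small example of why unfolding is delicate in the presence of nondeterminism (illustrative only, not taken from the paper): modelling evaluation as the set of possible results, unfolding a definition that duplicates a nondeterministic argument changes the semantics:

```python
# 'amb(a, b)' nondeterministically picks one of its arguments; evaluation of an
# expression is modelled as the set of values it may produce.

def amb(a, b):
    """Set of possible values of a nondeterministic choice."""
    return set(a) | set(b)

def add(xs, ys):
    """All possible sums of one value from xs and one from ys."""
    return {x + y for x in xs for y in ys}

def double(xs):
    """double(e) evaluates e once and adds the result to itself."""
    return {x + x for x in xs}

e = amb({1}, {2})            # the nondeterministic argument amb(1, 2)

print(double(e))             # original program double(amb(1, 2)):    {2, 4}
print(add(e, e))             # naive unfolding amb(1, 2) + amb(1, 2): {2, 3, 4}
```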